2024-02-04

What is a time series?

  • A time series is a set of observations of the same variable recorded over time. It can be broken down into three components:
    • Trend/Cycle
    • Seasonal
    • Remainder
  • Observations can be recorded at different frequencies (e.g. annual, quarterly, monthly, etc.)
  • Economic measures are the most commonly discussed examples, but any domain with data recorded over time can be analyzed this way

Why is Time Series Decomposition relevant?

  • It is helpful to understand the patterns that make up trends/cycles vs seasonal behaviors.
  • Knowing how the primary components interact with one another guides the choice of modeling technique.
  • If you want to make inferences from the data, decomposition is a good starting point for understanding these components.

Our Dataset

## # A tsibble: 6 x 5 [1h] <UTC>
## # Key:       start_station_name, rideable_type, member_casual [6]
## # Groups:    start_station_name, rideable_type @ start_time [6]
##   start_time          start_station_name rideable_type member_casual
##   <dttm>              <chr>              <chr>         <chr>        
## 1 2023-09-12 08:00:00 1 Ave & E 30 St    classic_bike  member       
## 2 2023-06-22 00:00:00 1 Ave & E 62 St    classic_bike  member       
## 3 2023-05-25 08:00:00 1 Ave & E 68 St    classic_bike  member       
## 4 2023-05-26 11:00:00 10 Ave & W 28 St   classic_bike  member       
## 5 2023-10-18 18:00:00 11 Ave & W 27 St   classic_bike  member       
## 6 2023-06-11 15:00:00 11 Ave & W 59 St   classic_bike  member       
## # ℹ 1 more variable: number_of_rides <int>

Viewing our Data

It’s always a good idea to plot your time series before attempting any decomposition.

Plotting Seasonality

Below is the seasonal plot of our data for a daily period. We see a higher number of rides around 5-6pm, when many people are commuting.

Lag Plots

  • Lag plots can be an effective way to display autocorrelation and seasonal cycles within your time series

Lag Plots (cont.)

Below we see a plot of weekly lags. There is little visible structure, which indicates a weak relationship between each value and its lagged values.
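A lag plot is the visual counterpart of the sample autocorrelation. As a minimal sketch in plain Python (not the R packages used for these slides), the function below computes the lag-k autocorrelation with the usual sample estimator. The names `autocorrelation` and `toy_series` are illustrative, and the series is synthetic rather than the ride data.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation between a series and itself shifted by `lag` steps."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    # Denominator: total squared deviation from the mean.
    denominator = sum(d * d for d in deviations)
    # Numerator: products of each deviation with the one `lag` steps earlier.
    numerator = sum(deviations[t] * deviations[t - lag] for t in range(lag, n))
    return numerator / denominator


# Synthetic series (an assumption, not the ride data): a pattern repeating every 4 steps.
toy_series = [1, 2, 3, 4] * 6
r4 = autocorrelation(toy_series, 4)  # lag equal to the period: strongly positive
r2 = autocorrelation(toy_series, 2)  # half a period: negative
```

A lag that matches a seasonal period gives a value close to 1, which would show up as points hugging the diagonal in a lag plot. Values near 0 correspond to the unstructured cloud described above.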

Time Series Components

  • Our example time-series data has two seasonal periods: daily and weekly
  • We can plot the different components of our decomposed series via the autoplot function

Time Series Components (cont.)

  • We can also easily seasonally adjust our data with the fabletools package
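
As a brief note on what seasonal adjustment means, assuming the usual component notation (\(y_t\) observed value, \(T_t\) trend-cycle, \(S_t\) seasonal, \(R_t\) remainder): the seasonally adjusted series is the data with the seasonal component removed. For an additive decomposition:

\[ y_t - S_t = T_t + R_t \]

For a multiplicative decomposition:

\[ \frac{y_t}{S_t} = T_t \times R_t \]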

Viewing Daily Rides by Membership

Follow Up

Membership Type
Type    Date        Max Daily Rides  Total Rides
member  2023-09-12             3160       731783
casual  2023-07-22             1722       257068

Moving Averages

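  • Useful for “smoothing” data to reduce seasonal fluctuations and reveal the trend-cycle.

A moving average of order \(m\) (an \(m\)-MA) averages the \(m\) observations centered on time \(t\):

\[ T_t = \frac{1}{m} \sum_{j=-k}^{k} y_{t+j} \]

\[k = \frac {(m-1)}{2}\]

More generally, a weighted moving average \(T_t = \sum_{j=-k}^{k} a_j y_{t+j}\) uses weights that sum to one and are symmetric:

\[a_j = a_{-j}\]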

Calculating for 01/04/2023

\[ 7\text{-MA} = \frac{1}{7} \sum_{j=-3}^{3} y_{t+j} \]

member_casual datetime total_member_rides 7-MA
casual 2023-01-01 732 NA
casual 2023-01-02 582 NA
casual 2023-01-03 265 NA
casual 2023-01-04 412 497.4286
casual 2023-01-05 418 456.0000
casual 2023-01-06 405 419.4286
casual 2023-01-07 668 425.4286
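
The 2023-01-04 value in the table can be reproduced by hand. Below is a minimal Python sketch of a centered moving average for odd orders. The function name `centered_moving_average` is illustrative, and only the seven ride counts shown above are used, so only the middle date has a complete window.

```python
def centered_moving_average(values, m):
    """Centered m-MA for odd m; positions without a full window are None."""
    if m % 2 == 0:
        raise ValueError("this sketch only handles odd m")
    k = (m - 1) // 2
    result = [None] * len(values)
    for t in range(k, len(values) - k):
        window = values[t - k : t + k + 1]  # the m observations centered on t
        result[t] = sum(window) / m
    return result


# Casual ride counts for 2023-01-01 through 2023-01-07, copied from the table.
casual_rides = [732, 582, 265, 412, 418, 405, 668]
seven_ma = centered_moving_average(casual_rides, 7)
# seven_ma[3] is 3482 / 7, which rounds to the 497.4286 listed for 2023-01-04.
```

The first three entries come back as None for the same reason the table shows NA: there are not enough earlier observations to fill the window.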

Calculating for 12/28/2023

member_casual datetime total_member_rides 7-MA
casual 2023-12-25 207 277.2857
casual 2023-12-26 247 281.0000
casual 2023-12-27 227 290.7143
casual 2023-12-28 286 325.0000
casual 2023-12-29 472 NA
casual 2023-12-30 407 NA
casual 2023-12-31 429 NA

Even-Order Moving Averages

Membership Type
Type    Hour                 Max Hourly Rides  Total Rides
member  2023-09-12 18:00:00               393       731783
casual  2023-07-04 22:00:00               227       257068

Calculating for 01/01/2023 12:00:00 PM

member_casual datetime total_member_rides 24-MA 2x24-MA
casual 2023-01-01 07:00:00 5 NA NA
casual 2023-01-01 08:00:00 4 NA NA
casual 2023-01-01 09:00:00 20 NA NA
casual 2023-01-01 10:00:00 21 NA NA
casual 2023-01-01 11:00:00 47 30.50000 NA
casual 2023-01-01 12:00:00 51 30.41667 30.45833
casual 2023-01-01 13:00:00 70 28.29167 29.35417
casual 2023-01-01 14:00:00 58 27.37500 27.83333
casual 2023-01-01 15:00:00 66 26.91667 27.14583
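
When the order is even, the m-MA is not centered on an observation, so a second 2-MA is applied to re-center it. The sketch below reproduces the 2x24-MA column from the 24-MA column. The alignment (averaging each 24-MA value with the one before it) is inferred from the numbers in this table, and the function name `two_by_m_from_m_ma` is illustrative.

```python
def two_by_m_from_m_ma(m_ma):
    """2-MA of an even-order m-MA column: average each value with the previous one."""
    result = [None] * len(m_ma)
    for t in range(1, len(m_ma)):
        if m_ma[t] is not None and m_ma[t - 1] is not None:
            result[t] = (m_ma[t - 1] + m_ma[t]) / 2
    return result


# 24-MA column for 07:00 through 15:00 on 2023-01-01, copied from the table.
ma_24 = [None, None, None, None, 30.50000, 30.41667, 28.29167, 27.37500, 26.91667]
ma_2x24 = two_by_m_from_m_ma(ma_24)
# ma_2x24[5] is (30.50000 + 30.41667) / 2, matching the 30.45833 listed for 12:00
# up to the rounding of the inputs.
```

Applying the two averages in sequence is equivalent to a single weighted moving average over 25 hours, with weight 1/48 on the two end points and 1/24 on the 23 hours in between.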

Classical Decomposition Methods

  • Additive Decomposition: \(y_t = S_t + T_t + R_t\)
    • Trend-Cycle Component \(T_t\) = 2 x m-MA if \(m\) is even, or m-MA if \(m\) is odd
    • Detrended Series = \(y_t - T_t\)
    • Seasonal Component \(S_t\) = average of the detrended values for each season (month, quarter, etc.)
    • Remainder (Irregular) Series = \(R_t = y_t - T_t - S_t\)

  • Multiplicative Decomposition: \(y_t = S_t \times T_t \times R_t\)
    • Trend-Cycle Component \(T_t\) = 2 x m-MA if \(m\) is even, or m-MA if \(m\) is odd
    • Detrended Series = \(\frac{y_t}{T_t}\)
    • Seasonal Component \(S_t\) = average of the detrended values for each season (month, quarter, etc.)
    • Remainder (Irregular) Series = \(R_t = \frac{y_t}{T_t \times S_t}\)
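
The additive steps can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the implementation used for these slides: the names `centered_ma` and `additive_decomposition` are hypothetical, the series is synthetic (a linear trend plus a repeating four-step pattern), and the seasonal averages are shifted to sum to zero, a common convention that the steps above do not spell out.

```python
def centered_ma(values, m):
    """m-MA for odd m, 2 x m-MA for even m; None where the window is incomplete."""
    n = len(values)
    half = m // 2
    if m % 2 == 1:
        weights = [1 / m] * m
    else:
        # A 2 x m-MA is a weighted (m + 1)-term average with half weight at both ends.
        weights = [1 / (2 * m)] + [1 / m] * (m - 1) + [1 / (2 * m)]
    trend = [None] * n
    for t in range(half, n - half):
        window = values[t - half : t + half + 1]
        trend[t] = sum(w * y for w, y in zip(weights, window))
    return trend


def additive_decomposition(values, m):
    """Classical additive decomposition; assumes the series spans several full cycles."""
    n = len(values)
    trend = centered_ma(values, m)
    detrended = [None if trend[t] is None else values[t] - trend[t] for t in range(n)]
    # Average the detrended values for each position in the seasonal cycle.
    seasonal_means = []
    for s in range(m):
        group = [detrended[t] for t in range(s, n, m) if detrended[t] is not None]
        seasonal_means.append(sum(group) / len(group))
    # Assumed convention: shift the seasonal values so they sum to zero over one cycle.
    offset = sum(seasonal_means) / m
    seasonal_means = [v - offset for v in seasonal_means]
    seasonal = [seasonal_means[t % m] for t in range(n)]
    remainder = [
        None if trend[t] is None else values[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, remainder


# Synthetic data: linear trend plus a pattern with period 4 and no noise.
pattern = [3, -1, -2, 0]
series = [10 + 0.5 * t + pattern[t % 4] for t in range(24)]
trend, seasonal, remainder = additive_decomposition(series, 4)
```

Because the synthetic series has no noise, the recovered seasonal values equal `pattern` and the remainder is zero wherever the trend is defined. With real data the remainder holds whatever the trend and seasonal components fail to explain. The multiplicative version follows the same outline with division in place of subtraction.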

Classical Decomposition Drawbacks

  • The trend-cycle estimate is missing for the first and last few observations, so the remainder is missing there too
  • The trend-cycle estimate tends to over-smooth rapid rises and falls in the data
  • The seasonal component is assumed to repeat unchanged from one cycle to the next (e.g. year to year)
  • It is not robust to unusual values, such as those caused by natural disasters or economic shocks.

STL Decomposition

  • Seasonal and Trend Decomposition using LOESS
  • Use the trend and season window parameters to control how rapidly the trend-cycle and seasonal components can change (smaller windows allow faster change)

Pros and Cons of STL Decomposition

Pros

  • Can handle different types of seasonality well (not just monthly/quarterly)
  • User has a lot of control over parameters (smoothness, windowing, etc)
  • Robust to outlier values

Cons

Alternative Approaches

  • Economists, who work extensively with time series data, have developed additional decomposition methods
    • X11/SEATS
      • Primarily handles monthly and quarterly data, and can account for holiday effects.
      • The seasonal component is allowed to vary over time
    • DSA
      • Developed at the Deutsche Bundesbank (Germany’s central bank) to handle daily data
      • Combines STL on multiple seasonal components and ARIMA for additional handling of outliers and calendar adjustments

Sources Used